Individualized estimation of arterial carbon dioxide partial pressure using machine learning in children receiving mechanical ventilation

Background Measuring arterial partial pressure of carbon dioxide (PaCO2) is crucial for proper mechanical ventilation, but the current sampling method is invasive. End-tidal carbon dioxide (EtCO2) has been used as a surrogate, which can be measured non-invasively, but its limited accuracy is due to ventilation-perfusion mismatch. This study aimed to develop a non-invasive PaCO2 estimation model using machine learning. Methods This retrospective observational study included pediatric patients (< 18 years) admitted to the pediatric intensive care unit of a tertiary children’s hospital and received mechanical ventilation between January 2021 and June 2022. Clinical information, including mechanical ventilation parameters and laboratory test results, was used for machine learning. Linear regression, multilayer perceptron, and extreme gradient boosting were implemented. The dataset was divided into 7:3 ratios for training and testing. Model performance was assessed using the R2 value. Results We analyzed total 2,427 measurements from 32 patients. The median (interquartile range) age was 16 (12−19.5) months, and 74.1% were female. The PaCO2 and EtCO2 were 63 (50−83) mmHg and 43 (35−54) mmHg, respectively. A significant discrepancy of 19 (12–31) mmHg existed between EtCO2 and the measured PaCO2. The R2 coefficient of determination for the developed models was 0.799 for the linear regression model, 0.851 for the multilayer perceptron model, and 0.877 for the extreme gradient boosting model. The correlations with PaCO2 were higher in all three models compared to EtCO2. Conclusions We developed machine learning models to non-invasively estimate PaCO2 in pediatric patients receiving mechanical ventilation, demonstrating acceptable performance. Further research is needed to improve reliability and external validation.


Individualized estimation of arterial carbon dioxide partial pressure using machine learning in children receiving mechanical ventilation
Hye-Ji Han 1 , Bongjin Lee 1,2* and June Dong Park 1

Background
Evaluating adequate oxygen and carbon dioxide (CO 2 ) exchange is crucial for managing patients receiving mechanical ventilation.The arterial partial pressure of CO 2 (PaCO 2 ) is commonly used for evaluation, but its invasive measurement via arterial blood sampling can be painful and cause iatrogenic complications such as mechanical damage of arteries, anemia, embolism, thrombosis, or nerve injury.In general, as the difficulty of the procedure increases in smaller infant, the degree of complications may become more severe [1].Therefore, this highlights the need for a non-invasive method to estimate PaCO 2 in children.
End-tidal CO 2 (EtCO 2 ) and transcutaneous CO 2 (TCO 2 ) monitoring are feasible alternatives.EtCO 2 reflects exhaled CO 2 , but ventilation-perfusion mismatch limits its accuracy as a surrogate for PaCO 2 .Factors like dead-space ventilation, severe atelectasis, and intrapulmonary shunts can influence its reliability in various clinical settings.Therefore, bedside practice often utilizes the EtCO 2 trend to estimate PaCO 2 trends rather than relying on a single value [2][3][4].TCO 2 , measuring the partial pressure of CO 2 in arteriolarized capillaries through skin warming, also has limitations that make it difficult to use it widely in clinical practice, especially in acute medical settings [5,6].
Few studies have attempted to estimate the dead space fraction using the difference between EtCO 2 and PaCO 2 , which was even described as an independent risk factor associated with mortality [7,8].However, developing a robust formula for this purpose remains challenging due to the influence of various physiological factors [7][8][9][10][11][12].Recent research has demonstrated the successful application of artificial intelligence for individualized estimation and prediction of complex factors, overcoming limitations of conventional methods [13][14][15].
This study aimed to develop machine learning models based on non-invasively collected data and compare their performance to estimate PaCO 2 more accurately in mechanically ventilated children.

Study setting and data collection
This retrospective cross-sectional observational study was conducted in the pediatric intensive care unit (PICU) of a university-affiliated children's hospital.Patients aged less than 18 years who were admitted to the PICU and received mechanical ventilation between January 2021 and June 2022 were eligible for inclusion.Patients who underwent extracorporeal membrane oxygenation, which allows oxygenation and CO 2 elimination using methods other than mechanical ventilation, were excluded.Additionally, patients who did not have PaCO This study was conducted in accordance with the principles of the Declaration of Helsinki.The protocol of this study was reviewed by the Institutional Review Board of Seoul National University Hospital.This study was recognized as a minimal risk study by the above committee because it used only pseudonymous information and did not collect personally identifiable information.Therefore, for the above reasons, the above committee waived the need for written informed consent and approved the conduct of this study (approval no.H-2307-160-1452).

Data preprocessing
Among the collected data, cases containing duplicates or missing values were excluded.A limitation of retrospectively collected blood gas analysis results is the inability to definitively differentiate between arterial and venous blood samples.Therefore, we assumed arterial blood when the arterial partial pressure of oxygen (PaO 2 ) was equal to or greater than 40 mmHg, thereby including cases with hypoxemia.Additionally, measurements outside the physiological range, suggesting errors in recording or measurement, were excluded.These criteria included: HR < 30 beats/minute or > 300 beats/ minute, RR < 5 breaths/minute or > 120 breaths/minute, EtCO 2 < 20 mmHg, and FiO 2 < 21%.For clinical information and ventilator parameters, time values were converted to hours by discarding minutes and seconds.Laboratory findings were integrated into a one-hour window.To improve model performance, FiO 2 , which exhibits a known non-linear relationship with PaCO 2 , was log-transformed.Patients' BPs were converted to percentiles based on age, sex, and height [16].Pulse pressure and oxygenation saturation index (OSI) were calculated using the collected variables following the method outlined [17].Data preprocessing was performed using R statistical packages R 4.3.1.

Strategy of analysis
The primary outcome of this study was the estimation performance of the developed models.Performance was compared with the R 2 values and visualized through the Bland-Altman analysis with 95% limits of agreement and scatter plots.The measured PaCO 2 was referred to as the gold standard, and the machine learning-derived estimated CO 2 was served as the comparator.The correlation coefficients between EtCO 2 and the measured PaCO 2 were presented with P-values as a secondary outcome.
Demographics and characteristics were presented as median (interquartile range) and compared using the Wilcoxon signed-rank test for non-normally distributed categorical variables and the chi-square test for other categorical variables.To account for the discrepancy between PaCO 2 and EtCO 2 , the absolute values of PaCO 2 -EtCO 2 were calculated.Correlations among the values were evaluated using Spearman's correlation coefficient.Linear regression analysis of PaCO 2 and EtCO 2 was performed to determine R 2 values.The normality of residuals was assessed using the Shapiro-Wilk test before conducting the Bland-Altman analysis.Statistical analyses were performed using R software version 4.3.1.,P -values less than 0.05 were considered statistically significant, and a confidence level of 95% was chosen for analysis.

Model development and validation
Using collinearity analysis of the classical linear regression and feature importance analysis in the models, the following variables that can be non-invasively retrievable in clinical settings were selected for machine learning: weight, percentile of systolic and diastolic BP, RR, HR, EtCO 2 , minute ventilation, log-transformed FiO 2 , SpO 2 , MAP and PEEP.To minimize the outliers' impact, the variables were scaled according to the quantile range.
To mitigate potential overfitting and selection bias from the varying numbers of individual measurements per patient, each patient's data was split into a 7:3 ratio for training and validation.Machine-learning models were developed using three algorithms: linear regression, multi-layer perceptron (MLP), and extreme gradient boosting (XGB).The final model of multivariate linear regression was built based on the Akaike information criterion, and multicolinearity was assessed using the variance inflation factor.Optimal hyperparameters for MLP and XGB models were extracted through a grid search with five-fold cross-validation [18,19].The MLP model was designed with two hidden layers: 50 nodes in the first and 10 in the second.ReLU activation and Adam solver were used.The XGB model had a maximum decision tree depth of seven, a learning rate of 0.01, and 500 estimators.Feature importance in the XGB model was determined using Shapley additive explanations (SHAP) values, reflecting the impact of various features on the model [20].Machine learning was implemented using Python version 3.10.4(Python Software Foundation, Beaverton, OR, USA; https://www.python.org)and open libraries such as Pandas, Keras, Pytorch, Numpy and Scikit-learn.

Baseline characteristics
From 32 pediatric patients, we collected a total of 2,427 measurements.After applying the exclusion criteria, 1,546 measurements from 30 patients remained for analysis (Fig. 1).Patient demographics and baseline characteristics were presented in Table 1.
The age at admission was 16 (12−19.5)months and female predominance was observed.The OSI indicated mild to moderate acute respiratory distress syndrome (ARDS), and the proportion of patients with OSI ≥ 12, corresponding to severe ARDS, was 31.46%[17].The percentiles of systolic and diastolic BP suggested that most patients were not hypotensive and had BP above the fifth percentile.The majority of patients were hypercapnic and the discrepancy between PaCO 2 and EtCO 2 was 19 (12−31) mmHg.The PaO 2 /FiO 2 ratio was 119.4 (82.6−202.2) mmHg.

Main outcomes
The R 2 values and correlation coefficients for the linear, MLP, and XGB models were summarized in Table 2. Bland-Altman analyses were performed for the models using scaled data.The linear regression model exhibited a mean difference of -0.574 with a 95% limit of agreement (LOA) ranging from − 18.449 to 17.301.The MLP model improved upon this, showing a mean difference of -0.108 with a tighter 95% LOA of -14.107 to 13.892.The XGB model achieved the best performance, with a mean difference of -0.340 and a narrow 95% LOA of -15.716 to 15.036.This is further confirmed by the scatter plot, which demonstrates minimal bias (Fig. 2).
Correlations were estimated using the Spearman correlation coefficient, and R 2 values were calculated to compare the developed models.MLP, multi-layer perceptron; XGB, extreme gradient boosting.
The Shapiro-Wilk test performed on residuals yielded P-values of 0.042, < 0.001, and < 0.001 for the linear regression, MLP, and XGB models, respectively.The developed models presented high correlations with PaCO 2 .In comparison, EtCO 2 and PaCO 2 displayed a correlation coefficient of 0.78 (P < 0.001) and an R 2 value of 0.58, highlighting the superior performance of machine learning models.Notably, the discrepancy between EtCO 2 and PaCO 2 increased with higher PaCO 2 values, and the group with OSI ≥ 12 was more prevalent at a greater discrepancy between EtCO 2 and PaCO 2 (Fig. 3).The feature importance of the XGB model, which exhibited the highest performance, is visualized in Fig. 4.

Discussion
This study aimed to develop a non-invasive machine learning model for estimating PaCO 2 in critically ill children receiving mechanical ventilation.We focused on minimizing the difference between PaCO 2 and EtCO 2 through machine learning, recognizing the significant impact that dead space ventilation or perfusion status can have on their correlation [7,21,22].
In the study cohort, the correlation coefficient between EtCO 2 and PaCO 2 was 0.78 (P < 0.001), with an R 2 value of 0.58.This was in with previous reporting R 2 values ranging from 0.4 to 0.9, depending on disease entities and ventilator modes [3,4,23].
Cross-sectional studies and clinical trials conducted in both adults and pediatric patients have investigated factors influencing the difference between EtCO 2 and [23] observed a discrepancy of 6.8 mmHg with a standard deviation of 6.4 mmHg, while Wang et al. [3] reported an even smaller discrepancy of 1.86 ± 7.42 mmHg in patients undergoing synchronized intermittent mandatory ventilation.In contrast to these previous studies, our study included patients with a larger discrepancy between PaCO 2 and EtCO 2 of 19 (12−31) mmHg and a lower PaO 2 /FiO 2 of 119.4 (82.6−202.2) mmHg.Even though the median OSI was in accordance with mild to moderate ARDS, the proportion of the group with OSI ≥ 12 was considerable.In our cohort, severely ill children with OSI ≥ 12 were observed more frequently at a higher discrepancy between EtCO 2 and PaCO 2, especially in hypercapnic groups.
Additionally, PaCO 2 levels in previous studies were reported to be lower than those observed in this study.Razi et al. [4] reported high accuracy (R 2 > 0.8) for patients with a mean of PaCO 2 of 45.8 ± 17.1 mmHg in synchronized intermittent mandatory ventilation mode.In our study, the PaCO 2 was hypercapnic at 63 (50−83) mmHg, with an OSI of 7.2 (4.4−8.6).As respiratory failure and significant dead space are relatively common in critically ill patients, the performance of the models estimating PaCO 2 may depend on their reliability in hypercapnic and hypoxemic patient groups.Utilizing artificial intelligence, we built more customized estimations compared to the traditional risk stratification methods [13][14][15].The XGB model achieved the highest R 2 value of 0.877 among the three models, all of which demonstrated better correlations with PaCO 2 than EtCO 2 , even in patients with hypercapnia and hypoxemia.
Nevertheless, several limitations need to be acknowledged.Firstly, utilizing PaO 2 as a cutoff to identify arterial blood in blood gas analysis might have led to the inclusion of venous blood results and potential exclusion of true arterial blood samples.This limitation is inherent to retrospective studies and arises due to the lack of explicit and clear labeling.Secondly, this study was conducted within a single-institution PICU.Consequently, external validation by other institutions or utilizing different measurement equipment was performed.Thirdly, despite separating training and testing data for each patient, repeated sampling from the individuals might have resulted in overfitting.Further studies with external validation and prospective cohorts are required.At last, given that our models were developed based on data retrieved in a 1-hour window, future analyses based on real-time data would prompt the clinical application of the developed model.
One of the key elements influencing the difference between EtCO 2 and PaCO 2 is the heterogeneous distribution of ventilation-perfusion mismatch among alveoli [21].Including capnographic waveforms as a variable is expected to improve model performance [22].At last, although interpreting relevant parameters is crucial for understanding gas exchange and dead space ventilation, the "black box" nature of machine learning makes it difficult to access detailed information regarding the developed model's mechanisms.Regarding this limitation, we tried to incorporate an explanation using SHAP values; however, it remains still challenging to provide an intuitive interpretability.
In conclusion, we successfully developed a machine learning-based model capable of non-invasively estimating PaCO 2 in critically ill pediatric patients receiving mechanical ventilation.The model demonstrated acceptable performance; however, external validation has not yet been performed, and further improvement in performance through follow-up studies is promising.

Fig. 1
Fig. 1 Flowchart of the study cohort used for model development and validation.PaO 2 : arterial partial pressure of oxygen; ECMO: extracorporeal membrane oxygenation

Fig. 2
Fig. 2 Performance and bias of the machine learning models.The Bland-Altman analysis for the linear regression model, the multi-layer perceptron (MLP) model, and the extreme gradient boosting (XGB) model are shown in (A), (B), and (C), respectively.The scatter plots for the linear regression model, the MLP model, and the XGB model are exhibited in (D), (E), and (F), respectively.CO2: carbon dioxide

Fig. 4 Fig. 3
Fig. 4 The feature importance in the XGB model.The relative importance of each feature was measured by Shapley additive explanations values.etco2: end-tidal carbon dioxide; log_fio2: log-transformed fraction of inspired oxygen; wt: weight; peep: positive end-expiratory pressure; ht: height; map: mean airway pressure, rr: respiratory rate; spo2: peripheral oxygen saturation; ve: minute ventilation; p_dbp: percentile of diastolic blood pressure; pip: peak inspiratory pressure; hr: heart rate; p_sbp: percentile of systolic blood pressure; |SHAP|: the absolute values of Shapley additive explanations; SHAP: Shapley additive explanations 2 measured during mechanical ventilation were excluded.

Table 1
Demographics and characteristics of patients

Table 2
The correlation between the estimated CO 2 and the measured PaCO 2